Abstract


Biologists frequently conduct experiments that measure the pattern of
inactivation of bacterial populations after exposure to a lethal environment.
This document describes a computer program that calculates many of the
quantities that have proven useful in the analysis of such experimental
data.

Office of Space Science and Applications, NASA Headquarters, Washington,

A COMPUTERIZED PROGRAM FOR STATISTICAL
TREATMENT OF BIOLOGICAL DATA

Introduction 

In the programs now underway in the Planetary Quarantine Department, it is
frequently necessary to compare subtle changes in the destruction pattern of
microorganisms. The use of standard pour plate techniques (refs. 1, 2) for
microbial assay during experimentation in some cases yields hundreds of data
points (plate counts). These must be reduced so that the successive samples
taken during application of the process represent the destruction rate of
microorganisms as a consequence of the process.

This destruction rate is best described by a survivor curve, since it relates
the number of surviving organisms at any time to the sterilization process.
The survivor curve is usually a plot of the logarithm of the number of
organisms surviving the sterilization treatment (y-axis) versus the equivalent
process time (x-axis). This logarithmic model, process time versus log of
survivors, seems to be the most practical representation of the data, since
essentially all thermoradiation and most heat and radiation sterilization has
exhibited the logarithmic order of destruction. Consequently, treatments can
be compared on the basis of the slope of the survivor curve or the D-value
determined from the slope.

Based on this rationale, a computerized program has been developed to handle
the statistical aspects of the data reduction. With the plate counts of each
successive sampling period as input, the program computes the mean value of
the replicate plate counts, the variance, the standard deviation, the upper
and lower .95 confidence limits, and the coefficient of variation for each
sampling period. Based on the coefficient of variation values for a sampling
period, the dilution (data set) exhibiting the best value is selected for
that period. These best sets are then used in computing the survivor curve
based on a least-squares fit of the logarithmic model.

Determination of Survivors 

At any specific sampling period the procedure for assay is as follows: Four
replicate samples are generally used for each sampling period. Aluminum foils
or 0.020-inch-thick square planchets are used as a substrate for the test
organisms. After exposure to the sterilization treatment the substrate
material is placed in a beaker with 10 ml of sterile water and insonated for
two minutes to suspend the organisms. From this base suspension, measured
amounts of the inoculum are transferred to petri dishes or additional
dilution blanks as required to result in plate counts between 30 and 300
colonies per plate. Within this range, the counts can be accurate, and the
possibility of interference of the growth of one organism with that of
another is minimized.


The determination of viable population from the resultant plate count is made
as follows:

Using the arrangement of dilutions* in Figure 1, the inoculum from each
of the four replicate samples for a single time period is plated in
duplicate. Consequently, there are eight plates for each sampling period
at a single level of dilution. Sometimes as many as three dilutions are
plated out, with the best set of data used as the surviving population at
that sampling period.


[Diagram not reproduced. Legible labels: FOIL OR PLANCHET; 1 ml; "4" DILUTION,
SURVIVORS = 100,000 x PLATE COUNT; 0.1 ml; "5" DILUTION, SURVIVORS =
1,000,000 x PLATE COUNT.]

Figure 1. Sample Assay and Dilution Procedure


*For consistency in the input data, the total survivor plate counts will be
assigned an order of dilution of "-1".



Statistical Methods 


If we consider a single microorganism of a given type, we see that its loss of
viability in a lethal environment is a random event. This fact has been
explained in terms of natural variations between microorganisms brought about,
in part, by their past history and by the hypothesis that loss of viability is
due to the occurrence of chemical reactions (ref. 4). In modeling the
inactivation of microorganisms, researchers have usually attempted to derive
expressions for the probability of single spore survival as a function of time
of exposure to a given environment.

As we have pointed out earlier, instead of looking at the inactivation of a single 
spore, an experimenter considers the number of survivors in a given population as a 
function of time. We shall let the random variable N(t) be the number of survivors 
at time t and let p(t) represent the probability of single spore survival at time t. The 
model we shall assume defines the conditional probability as 

$$\mathrm{Prob}\{N(t) = k \mid N(0) = N_0\} = \binom{N_0}{k} [p(t)]^{k} [1 - p(t)]^{N_0 - k} . \tag{1}$$

Using the definition of conditional probabilities we have 

$$\mathrm{Prob}\{N(t) = k\} = \sum_{N_0 = 0}^{\infty} \mathrm{Prob}\{N(0) = N_0\}\, \mathrm{Prob}\{N(t) = k \mid N(0) = N_0\} . \tag{2}$$

Combining (1) and (2) yields 

$$\mathrm{Prob}\{N(t) = k\} = \sum_{N_0 = 0}^{\infty} \binom{N_0}{k} [p(t)]^{k} [1 - p(t)]^{N_0 - k}\, \mathrm{Prob}\{N(0) = N_0\} . \tag{3}$$

We are usually interested in the expected value of the number of survivors as
a function of time. Using the expression (3) it can be shown that

$$E(N(t)) = E(N(0))\, p(t) . \tag{4}$$
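
The step from (3) to (4) can be sketched as follows. The only fact used beyond
(3) is that a binomial variable with parameters $N_0$ and $p(t)$ has mean
$N_0\, p(t)$; the interchange of the two sums assumes that $E(N(0))$ is finite.

```latex
\begin{align*}
E(N(t)) &= \sum_{k=0}^{\infty} k \,\mathrm{Prob}\{N(t) = k\} \\
        &= \sum_{N_0=0}^{\infty} \mathrm{Prob}\{N(0) = N_0\}
           \sum_{k=0}^{N_0} k \binom{N_0}{k} [p(t)]^{k} [1 - p(t)]^{N_0 - k} \\
        &= \sum_{N_0=0}^{\infty} \mathrm{Prob}\{N(0) = N_0\}\, N_0\, p(t) \\
        &= E(N(0))\, p(t) .
\end{align*}
```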

This is the basic expression for our model. In particular, the most widely
used expression for the probability of single spore survival is provided by
what is known as the log model. Using this model we would have

$$p(t) = 10^{-t/D}$$

and thus (4) becomes

$$E(N(t)) = E(N(0))\, 10^{-t/D} . \tag{4'}$$

In this model, D is assumed to be a fixed parameter for a microorganism
belonging to a homogeneous population.
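
As a small illustration of the log model, the sketch below evaluates $p(t)$
and equation (4') directly. The function names and the numerical inputs (an
initial population of 1.0e6 and D = 4.4 time units) are hypothetical and are
chosen only to show that the expected population falls by a factor of ten for
every D units of exposure.

```python
def survival_probability(t, d_value):
    """Single-spore survival probability p(t) = 10**(-t/D) under the log model."""
    if d_value <= 0:
        raise ValueError("D must be positive")
    return 10.0 ** (-t / d_value)


def expected_survivors(initial_mean, t, d_value):
    """Expected number of survivors E(N(t)) from equation (4')."""
    return initial_mean * survival_probability(t, d_value)


# Hypothetical inputs: E(N(0)) = 1.0e6 spores and D = 4.4 time units.
for t in (0.0, 4.4, 8.8):
    print(t, expected_survivors(1.0e6, t, 4.4))
```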


Let us return for a moment to the experimental method used and consider the
quantities we wish to compute for each dilution and each time period. Let us
define

$$x_{ij}(t_\ell) = \text{number of colonies on plate } i \text{ of dilution } j \text{ at sampling period } \ell ,$$

$$\ell = 1, \ldots, M, \qquad j = 1, \ldots, K_\ell, \qquad i = 1, \ldots, N_{j\ell} ,$$

where

$M$ = number of sampling periods,
$K_\ell$ = number of dilutions at sampling period $\ell$,
$N_{j\ell}$ = number of plates of dilution $j$ for sampling period $\ell$,

and

$t_\ell$ = time of sampling period $\ell$ (in any units desired).

The mean of the plate counts for a particular dilution and sampling period is

$$\bar{x}_j(t_\ell) = \frac{1}{N_{j\ell}} \sum_{i=1}^{N_{j\ell}} x_{ij}(t_\ell)$$

while the variance of the distribution of plate counts can be approximated by
the sample variance, which is given by

$$s_j^2(t_\ell) = \frac{\sum_{i=1}^{N_{j\ell}} [x_{ij}(t_\ell) - \bar{x}_j(t_\ell)]^2}{N_{j\ell} - 1} = \frac{\sum_{i=1}^{N_{j\ell}} x_{ij}(t_\ell)^2 - N_{j\ell}\, \bar{x}_j(t_\ell)^2}{N_{j\ell} - 1} . \tag{5}$$


Similarly, the standard deviation is approximated by the sample standard
deviation, which is given by

$$s_j(t_\ell) = \sqrt{s_j^2(t_\ell)} . \tag{6}$$


To be more precise, let $\sigma_j^2(t_\ell)$ be the variance in the plate
counts (this includes natural variation as well as any errors). The sample
variance is a random variable which depends on the counts of the replicate
plates. It can be shown that

$$\sigma_j^2(t_\ell) = E(s_j^2(t_\ell)) .$$

Another desirable quantity for each dilution at each time interval is the
confidence interval for the mean. This confidence interval is given for a
particular time period by the expression

$$\left( \bar{x}_j(t_\ell) - \frac{k\, s_j(t_\ell)}{\sqrt{N_{j\ell}}} ,\;\; \bar{x}_j(t_\ell) + \frac{k\, s_j(t_\ell)}{\sqrt{N_{j\ell}}} \right) . \tag{7}$$


It is well known that as the number of samples, $N_{j\ell}$, becomes larger
the parameter k for the $\alpha$ confidence limit should be chosen as the
100 $\alpha$/2 percentage point of the normal distribution. Thus, for the .95
confidence, k = 1.96 if $N_{j\ell}$ is large. Unfortunately, the number of
plates of a given dilution at a given sampling time is usually small. In this
case, the 100 $\alpha$/2 percentage point of the Student's t-distribution with
$N_{j\ell} - 1$ degrees of freedom is more appropriate for k. We can
approximate k to sufficient accuracy by

$$k = \lambda \left[ 1 + \frac{\lambda^2 + 1}{4 (N_{j\ell} - 1)} + \frac{(\lambda^2 + 3)(5\lambda^2 + 1)}{96 (N_{j\ell} - 1)^2} \right] , \tag{8}$$

where $\lambda$ is the 100 $\alpha$/2 percentage point of the normal
distribution. For our .95 confidence interval we let $\lambda$ = 1.96 in (8)
to get our k for (7).
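
The approximation (8) is easy to evaluate by hand or by machine. The sketch
below is one way to write it; the function name and argument names are
hypothetical, and the default of 1.96 is the .95 value quoted in the text.

```python
def approx_t_multiplier(n_plates, lam=1.96):
    """Approximate Student's t percentage point for n_plates - 1 degrees of
    freedom, using the series in equation (8)."""
    nu = n_plates - 1  # degrees of freedom
    if nu < 1:
        raise ValueError("at least two plates are needed")
    lam2 = lam * lam
    return lam * (1.0
                  + (lam2 + 1.0) / (4.0 * nu)
                  + (lam2 + 3.0) * (5.0 * lam2 + 1.0) / (96.0 * nu * nu))


print(approx_t_multiplier(8))  # eight plates, 7 degrees of freedom
```

For eight plates this gives k of about 2.36, which is close to the tabulated
Student's t value of 2.365 for 7 degrees of freedom; the agreement is poorer
when only a few plates are available.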

A good measure of the amount of spread in a particular set of data has been
found to be the relative standard deviation (ref. 5). This is more commonly
known as the coefficient of variation. For each sampling period and each
dilution it is defined to be

$$C_j(t_\ell) = \frac{s_j(t_\ell)}{\bar{x}_j(t_\ell)} .$$

In calculating the fit to the data of our "straight line" model, we wish to
use the dilution at each time period which has the "tightest" data. We shall
use the coefficient of variation as an index of the spread. Therefore, we let

$$X(t_\ell) = \bar{x}_J(t_\ell) , \qquad S^2(t_\ell) = s_J^2(t_\ell) ,$$

where $J$ is chosen to minimize $C_j(t_\ell)$ for $j = 1, \ldots, K_\ell$. Let
the order of this dilution (as defined in Figure 1) be $d_\ell$.
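
The selection rule can be sketched in a few lines. The function names and the
dictionary layout (order of dilution mapped to its plate counts) are
hypothetical; the counts are those of the t = 18 sampling period of the
example, the only period with two dilutions, as best they can be read.

```python
import math


def coefficient_of_variation(counts):
    """Sample standard deviation divided by the mean of the plate counts."""
    n = len(counts)
    mean = sum(counts) / n
    variance = sum((x - mean) ** 2 for x in counts) / (n - 1)
    return math.sqrt(variance) / mean


def choose_dilution(dilutions):
    """Return (order of dilution, mean count) for the dilution whose
    plate counts have the smallest coefficient of variation."""
    best = min(dilutions, key=lambda d: coefficient_of_variation(dilutions[d]))
    counts = dilutions[best]
    return best, sum(counts) / len(counts)


period = {
    0: [26, 16, 12, 14, 11, 10, 18, 9],
    -1: [157, 117, 92, 162],
}
order, mean_count = choose_dilution(period)
print(order, mean_count)
```

Here the dilution of order -1 is selected, since its coefficient of variation
(about .25) is smaller than that of the order 0 dilution (about .38).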

We are now prepared to again consider the problem of applying our model to
the data. Let

$$Y(t_\ell) = X(t_\ell) \times 10^{\,d_\ell + 1} .$$
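
As a small illustration of converting a selected mean plate count into an
estimated number of survivors, the sketch below assumes a multiplier of
10^(d+1) for a dilution of order d. That exponent is an inference from
Figure 1, where the "4" dilution corresponds to 100,000 x plate count; the
function name is hypothetical.

```python
def survivors_estimate(mean_count, order_of_dilution):
    """Estimated survivors for a mean plate count at a given order of dilution.

    Assumes the multiplier 10**(d + 1) implied by Figure 1.
    """
    return mean_count * 10.0 ** (order_of_dilution + 1)


print(survivors_estimate(272.875, 3))   # first sampling period of the example
print(survivors_estimate(132.0, -1))    # total survivor plate count
```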


Then $Y(t_\ell)$ is an estimate of $E(N(t_\ell))$. In our model we wish to use
$Y(t_\ell)$, $\ell = 1, \ldots, M$, to determine $E(N(0))$ and $D$ as
accurately as possible and to obtain some measurements of the statistical
variations. Taking the natural log on both sides of (4') we obtain

$$\log E(N(t)) = \log E(N(0)) + \gamma t , \tag{10}$$

where

$$\gamma = -\frac{2.303}{D} . \tag{11}$$

With this model in mind, let us consider the equation

$$y(t) = \alpha + \beta t + \epsilon , \tag{12}$$

where $\epsilon$ is a random variable representing the variation of the
measured values about the line $\alpha + \beta t$.

Comparing (10) and (12) we see that we are assuming that

$$\alpha = \log E(N(0)) \qquad \text{or} \qquad E(N(0)) = e^{\alpha} \tag{13}$$

and

$$\beta = -\frac{2.303}{D} = \gamma . \tag{14}$$


The random variable $\epsilon$ in (12) represents the variation of the mean of
the plate counts from the log model. This is assumed to be independent of
time. This is consistent with assuming that the distribution of the variation
in plate counts from the log model is independent of time. Let

$$y_\ell = \log Y(t_\ell) .$$

Then $y_\ell$ is a sampled value of the random variable $\log E(N(t_\ell))$.
Let us assume that $\epsilon$ is normally distributed and that

$$E(\epsilon) = 0 .$$

For later convenience, let the variance of the distribution of $\epsilon$ be
represented by $\sigma_\epsilon^2$.


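We are now prepared to calculate $\alpha$ and $\beta$. The following
definitions will prove valuable:

1. $$\bar{t} = \frac{1}{M} \sum_{\ell=1}^{M} t_\ell \tag{15}$$

2. $$\bar{y} = \frac{1}{M} \sum_{\ell=1}^{M} y_\ell \tag{16}$$

3. $$b = \frac{\sum_{\ell=1}^{M} (t_\ell - \bar{t})(y_\ell - \bar{y})}{\sum_{\ell=1}^{M} (t_\ell - \bar{t})^2} \tag{17}$$

4. $$a = \bar{y} - b\,\bar{t} \tag{18}$$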
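
Definitions (15) through (18) translate directly into code. The sketch below
is a minimal illustration with hypothetical function and variable names; the
times and survivor estimates are the T and SAMP columns of the example output
in Figure 6, as best they can be read.

```python
import math


def fit_log_model(times, survivors):
    """Least-squares intercept a and slope b of log(survivors) against time,
    following definitions (15) to (18)."""
    m = len(times)
    y = [math.log(v) for v in survivors]          # natural logarithms
    t_bar = sum(times) / m                        # (15)
    y_bar = sum(y) / m                            # (16)
    s_ty = sum((t - t_bar) * (v - y_bar) for t, v in zip(times, y))
    s_tt = sum((t - t_bar) ** 2 for t in times)
    b = s_ty / s_tt                               # (17)
    a = y_bar - b * t_bar                         # (18)
    return a, b


times = [0.0, 3.0, 6.0, 9.0, 12.0, 15.0, 18.0, 21.0]
survivors = [2.72875e6, 6.525e5, 6.875e4, 2.28e4, 3.3e3, 9.325e2, 1.32e2, 7.075e1]
a, b = fit_log_model(times, survivors)
print("slope", b)
print("intercept as a population", math.exp(a))
```

The slope comes out near -.521 and exp(a) near 2.31E+06, in line with the
SLOPE and INTERCEPT values printed in Figure 6.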


The quantities a and b depend on the samples used. The Gauss-Markoff theorem
tells us that a and b coincide with the maximum likelihood estimates of
$\alpha$ and $\beta$ and that they are unbiased, i.e.,

$$E(a) = \alpha$$

and

$$E(b) = \beta .$$

Letting

$$Z(t) = a + bt$$

and defining the standard error of estimate, $S_{y|t}$, by

$$S_{y|t}^2 = \frac{\sum_{\ell=1}^{M} [y_\ell - Z(t_\ell)]^2}{M - 2} , \tag{19}$$

the Gauss-Markoff theorem also tells us that $S_{y|t}^2$ is an unbiased
estimate of $\sigma_\epsilon^2$, i.e.,

$$E(S_{y|t}^2) = \sigma_\epsilon^2 .$$

In addition to the Gauss-Markoff theorem, our assumptions on $\epsilon$ imply
that a and b are also the minimum variance unbiased estimates of $\alpha$ and
$\beta$ respectively from among the class of all linear estimates. The
standard deviation in b, which is given by

$$\sigma_b = \frac{\sigma_\epsilon}{\sqrt{\sum_{\ell=1}^{M} (t_\ell - \bar{t})^2}} ,$$

can be approximated by the standard error in the slope,

$$S_b = \frac{S_{y|t}}{\sqrt{\sum_{\ell=1}^{M} (t_\ell - \bar{t})^2}} .$$



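Similarly, the standard deviation of the distribution of a,

$$\sigma_a = \sigma_\epsilon \sqrt{\frac{1}{M} + \frac{\bar{t}^{\,2}}{\sum_{\ell=1}^{M} (t_\ell - \bar{t})^2}} ,$$

can be approximated by the standard error in a,*

$$S_a = S_{y|t} \sqrt{\frac{1}{M} + \frac{\bar{t}^{\,2}}{\sum_{\ell=1}^{M} (t_\ell - \bar{t})^2}} ,$$

where $S_{y|t}$ is given by (19).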


It is also desirable in many applications to have a measure of how closely the
variation in the log of the means of plate counts can be explained on the
basis of only the variation of time in the lethal environment. The correlation
coefficient, r, is defined by

$$r = \frac{\sum_{\ell=1}^{M} (t_\ell - \bar{t})(y_\ell - \bar{y})}{\sqrt{\sum_{\ell=1}^{M} (t_\ell - \bar{t})^2 \, \sum_{\ell=1}^{M} (y_\ell - \bar{y})^2}} .$$

*The quantity $S_a$ shall be called the standard error in the estimated
intercept.

Feller proves the following statements can be made concerning r:

1. $|r| \le 1$

2. $r = \pm 1$ implies that there exist constants $p$ and $\theta$ such that
$y = pt + \theta$ (except for a set of lines which have zero probability
of occurring).

In addition, it can be shown that if y and t are independent, then r = 0. The
converse of this statement is not true, however.
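
The goodness-of-fit quantities can be computed alongside the fit. The sketch
below is an illustration with hypothetical names. It uses the usual sample
form of the correlation coefficient, and the data are the T and SAMP columns
of the example output in Figure 6, as best they can be read.

```python
import math


def fit_diagnostics(times, survivors):
    """Squared standard error of estimate, standard error of the slope, and
    correlation coefficient for the straight-line fit to log(survivors)."""
    m = len(times)
    y = [math.log(v) for v in survivors]
    t_bar = sum(times) / m
    y_bar = sum(y) / m
    s_tt = sum((t - t_bar) ** 2 for t in times)
    s_yy = sum((v - y_bar) ** 2 for v in y)
    s_ty = sum((t - t_bar) * (v - y_bar) for t, v in zip(times, y))
    b = s_ty / s_tt
    a = y_bar - b * t_bar
    # Residual sum of squares about the fitted line, divided by M - 2.
    s2 = sum((v - (a + b * t)) ** 2 for t, v in zip(times, y)) / (m - 2)
    return {
        "s2_y_given_t": s2,
        "s_b": math.sqrt(s2 / s_tt),
        "r": s_ty / math.sqrt(s_tt * s_yy),
    }


times = [0.0, 3.0, 6.0, 9.0, 12.0, 15.0, 18.0, 21.0]
survivors = [2.72875e6, 6.525e5, 6.875e4, 2.28e4, 3.3e3, 9.325e2, 1.32e2, 7.075e1]
diag = fit_diagnostics(times, survivors)
print(diag)
```

For these data the squared standard error of estimate is about .136 and the
standard error in the slope about .019, matching the .13595 and .01896 printed
in Figure 6.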

Let us return for a moment to the probability of single spore survival. Most
microbiologists are interested in the D-value of the population. We have shown
that we can approximate the D-value by

$$D = -\frac{1}{b \log_{10} e} = -\frac{2.303}{b} .$$

In addition, the standard error in the estimated D is given by

$$S_D = \frac{2.303\, S_b}{b^2} = \frac{D\, S_b}{|b|} .$$
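
The conversion from slope to D-value is a one-line calculation. In the sketch
below the function names are hypothetical, and `math.log(10.0)` (about 2.3026)
is used in place of the rounded constant 2.303, so results can differ from the
text in the fourth significant figure.

```python
import math


def d_value(slope):
    """D-value from the slope of the natural-log survivor curve."""
    if slope >= 0:
        raise ValueError("slope must be negative for a decaying population")
    return -math.log(10.0) / slope


def d_value_standard_error(slope, slope_standard_error):
    """Standard error in the estimated D, obtained from the standard error
    in the slope."""
    return math.log(10.0) * slope_standard_error / slope**2


# Slope and its standard error as printed in Figure 6.
print(d_value(-0.521))
print(d_value_standard_error(-0.521, 0.01896))
```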

Another feature which it is sometimes desirable to have available is the
confidence band about the curve representing the model. This is also easily
computed. Let

$$S_Z(t_\ell) = S_{y|t} \sqrt{\frac{1}{M} + \frac{(t_\ell - \bar{t})^2}{\sum_{i=1}^{M} (t_i - \bar{t})^2}} .$$

Then the upper 95% confidence line is given by

$$Z_U(t_\ell) = a + b\,t_\ell + k\,S_Z(t_\ell)$$

and the lower by

$$Z_L(t_\ell) = a + b\,t_\ell - k\,S_Z(t_\ell) ,$$

where k is given by the Student's t-distribution of degree M-2. For the .95
confidence interval k is approximated by

$$k = \lambda \left[ 1 + \frac{\lambda^2 + 1}{4 (M - 2)} + \frac{(\lambda^2 + 3)(5\lambda^2 + 1)}{96 (M - 2)^2} \right] ,$$

where $\lambda$ = 1.96.
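
The confidence band can be sketched as below. The names are hypothetical, and
one assumption is made that the text does not state explicitly: the band is
converted back to survivor units by exponentiation, which is how the MODEL,
UPPER, and LOWER columns of Figure 6 appear to be reported. The data are the T
and SAMP columns of that figure, as best they can be read.

```python
import math


def confidence_band(times, survivors, lam=1.96):
    """Return (t, lower, model, upper) in survivor units at each sampling time."""
    m = len(times)
    y = [math.log(v) for v in survivors]
    t_bar = sum(times) / m
    y_bar = sum(y) / m
    s_tt = sum((t - t_bar) ** 2 for t in times)
    b = sum((t - t_bar) * (v - y_bar) for t, v in zip(times, y)) / s_tt
    a = y_bar - b * t_bar
    s = math.sqrt(sum((v - (a + b * t)) ** 2 for t, v in zip(times, y)) / (m - 2))
    nu = m - 2  # degrees of freedom for the fitted line
    k = lam * (1.0 + (lam**2 + 1.0) / (4.0 * nu)
               + (lam**2 + 3.0) * (5.0 * lam**2 + 1.0) / (96.0 * nu**2))
    band = []
    for t in times:
        z = a + b * t
        s_z = s * math.sqrt(1.0 / m + (t - t_bar) ** 2 / s_tt)
        band.append((t, math.exp(z - k * s_z), math.exp(z), math.exp(z + k * s_z)))
    return band


times = [0.0, 3.0, 6.0, 9.0, 12.0, 15.0, 18.0, 21.0]
survivors = [2.72875e6, 6.525e5, 6.875e4, 2.28e4, 3.3e3, 9.325e2, 1.32e2, 7.075e1]
band = confidence_band(times, survivors)
for row in band:
    print("%5.1f  %.4e  %.4e  %.4e" % row)
```

The band is narrowest near the mean sampling time and widens toward both ends
of the experiment, which is why the first and last rows show the largest
ratio of upper to lower limit.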


The Program

The flow chart for the program is given in Figure 2 and is self-explanatory.
The input is prepared in the manner illustrated in Figure 3. The output is
described in Figure 4 using the notation of the previous section.

Figure 5 provides an example of the input data, while Figure 6 gives the
output from the use of the program on this example.

Finally, a graphical representation of the data, the model, and the .95
confidence interval is shown in Figure 7.

[Flow chart not reproduced. Legible label: CONFIDENCE INTERVAL COMPUTED.]

Figure 2. Program Flow Chart

[Input layout diagram not reproduced. Legible labels: TITLE OF EXPERIMENT;
"Repeat for each sampling time period".]

Figure 3. Input Format

24 NOVEMBER 1970
Number of sampling periods: 8

Time   No. of      No. of   Order of   Plate counts
       dilutions   plates   dilution
 0.0       1          8         3      265. 282. 292. 267. 250. 277. 265. 285.
 3.0       1          8         3      67.  58.  62.  57.  51.  75.  80.  72.
 6.0       1          8         2      58.  63.  80.  83.  62.  66.  75.  63.
 9.0       1          8         1      278. 212. 197. 201. 214. 255. 236. 231.
12.0       1          8         1      48.  37.  37.  25.  25.  21.  42.  29.
15.0       1          8         0      116. 135. 87.  62.  95.  82.  85.  84.
18.0       2          8         0      26.  16.  12.  14.  11.  10.  18.  9.
                      4        -1      157. 117. 92.  162.
21.0       1          4        -1      51.  50.  93.  89.

Figure 5. Example of Input Data



24 NOVEMBER 1970

DATA SET = 1
TIME = 0.000
NO. DIL. = 1
NUMBER DATA POINTS = 8
ORDER OF DIL. = 3
DATA
  265.00  282.00  292.00  267.00  250.00  277.00  265.00  285.00
MEAN = 272.875  VARIANCE = 185.0  S.D. = 13.6  UPPER .95 C.I. = 284.2  LOWER .95 C.I. = 261.5  CV = .0498
DIL. CHOSEN = 3

DATA SET = 2
TIME = 3.000
NO. DIL. = 1
NUMBER DATA POINTS = 8
ORDER OF DIL. = 3
DATA
  67.00  58.00  62.00  57.00  51.00  75.00  80.00  72.00
MEAN = 65.250  VARIANCE = 99.4  S.D. = 10.0  UPPER .95 C.I. = 73.6  LOWER .95 C.I. = 56.9  CV = .1528
DIL. CHOSEN = 3

DATA SET = 3
TIME = 6.000
NO. DIL. = 1
NUMBER DATA POINTS = 8
ORDER OF DIL. = 2
DATA
  58.00  63.00  80.00  83.00  62.00  66.00  75.00  63.00
MEAN = 68.750  VARIANCE = 86.2  S.D. = 9.3  UPPER .95 C.I. = 76.5  LOWER .95 C.I. = 61.0  CV = .1351
DIL. CHOSEN = 2

DATA SET = 4
TIME = 9.000
NO. DIL. = 1
NUMBER DATA POINTS = 8
ORDER OF DIL. = 1
DATA
  278.00  212.00  197.00  201.00  214.00  255.00  236.00  231.00
MEAN = 228.000  VARIANCE = 777.7  S.D. = 27.9  UPPER .95 C.I. = 251.2  LOWER .95 C.I. = 204.8  CV = .1223
DIL. CHOSEN = 1

DATA SET = 5
TIME = 12.000
NO. DIL. = 1
NUMBER DATA POINTS = 8
ORDER OF DIL. = 1
DATA
  48.00  37.00  37.00  25.00  25.00  21.00  42.00  29.00
MEAN = 33.000  VARIANCE = 89.4  S.D. = 9.5  UPPER .95 C.I. = 40.9  LOWER .95 C.I. = 25.1  CV = .2866
DIL. CHOSEN = 1

DATA SET = 6
TIME = 15.000
NO. DIL. = 1
NUMBER DATA POINTS = 8
ORDER OF DIL. = 0
DATA
  116.00  135.00  87.00  62.00  95.00  82.00  85.00  84.00
MEAN = 93.250  VARIANCE = 508.5  S.D. = 22.5  UPPER .95 C.I. = 112.0  LOWER .95 C.I. = 74.5  CV = .2418
DIL. CHOSEN = 0

DATA SET = 7
TIME = 18.000
NO. DIL. = 2
NUMBER DATA POINTS = 8
ORDER OF DIL. = 0
DATA
  26.00  16.00  12.00  14.00  11.00  10.00  18.00  9.00
MEAN = 14.500  VARIANCE = 30.9  S.D. = 5.6  UPPER .95 C.I. = 19.1  LOWER .95 C.I. = 9.9  CV = .3831
NUMBER DATA POINTS = 4
ORDER OF DIL. = -1
DATA
  157.00  117.00  92.00  162.00
MEAN = 132.000  VARIANCE = 1116.7  S.D. = 33.4  UPPER .95 C.I. = 183.2  LOWER .95 C.I. = 80.8  CV = .2532
DIL. CHOSEN = -1

DATA SET = 8
TIME = 21.000
NO. DIL. = 1
NUMBER DATA POINTS = 4
ORDER OF DIL. = -1
DATA
  51.00  50.00  93.00  89.00
MEAN = 70.750  VARIANCE = 549.6  S.D. = 23.4  UPPER .95 C.I. = 106.7  LOWER .95 C.I. = 34.8  CV = .3314
DIL. CHOSEN = -1

SLOPE = -.521  D VALUE = 4.421  INTERCEPT = 2.3104308639E+06
CORR. COEF. = .58016  STAND. ERR. IN EST. SLOPE = .01896  STAND. ERR. OF EST. = .13595  STAND. ERR. IN EST. INTER. = .19913

                                                            .95 CONF. INTERVAL
T                   SAMP                MODEL               UPPER               LOWER
0.                  2.7287500000E+06    2.3104308639E+06    4.1234704598E+06    1.2945626333E+06
3.0000000000E+00    6.5250000000E+05    4.8409570870E+05    7.7422063309E+05    3.0266975685E+05
6.0000000000E+00    6.8750000000E+04    1.0143071530E+05    1.4820424844E+05    6.3416995170E+04
9.0000000000E+00    2.2800000000E+04    2.1252388364E+04    2.9406330336E+04    1.5359414317E+04
1.2000000000E+01    3.3000000000E+03    4.4529313417E+03    6.1613954984E+03    3.2181991139E+03
1.5000000000E+01    9.3250000000E+02    9.3300560831E+02    1.3632497272E+03    6.3854732390E+02
1.8000000000E+01    1.3200000000E+02    1.9548908311E+02    3.1264826143E+02    1.2223314929E+02
2.1000000000E+01    7.0750000000E+01    4.0960077062E+01    7.3102238390E+01    2.2950431476E+01

Figure 6. Example of Output Data

[Plot not reproduced. Vertical axis label: SURVIVORS.]

Figure 7. Graphical Representation
of Program Output